business scenario
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Wang, Haoxin, Peng, Xianhan, Huang, Xucheng, Huang, Yizhe, Gong, Ming, Yang, Chenghan, Liu, Yang, Jiang, Ling
In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions and a realistic task dataset derived from authentic e-commerce dialogues. These tasks, covering a wide range of business scenarios, are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o score only 10-20% on the pass^3 metric in our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. The code and data have been made publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Czechia > Prague (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
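The ECom-Bench abstract above reports a pass^3 score without defining it. The sketch below assumes the common reading of pass^k used in agent benchmarks: the chance that k independent trials of the same task all succeed, averaged over tasks, estimated as C(c, k) / C(n, k) for a task with c successes in n trials. The function name and the sample data are hypothetical, and the benchmark's own implementation may differ.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k under the assumed definition.

    successes_per_task: number of successful trials for each task.
    num_trials: trials run per task (n).
    k: number of trials that must all succeed.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not successes_per_task:
        raise ValueError("need at least one task")
    total = 0.0
    for c in successes_per_task:
        # Probability that k trials drawn without replacement from the
        # n recorded trials are all successes; comb(c, k) is 0 when c < k.
        total += comb(c, k) / comb(num_trials, k)
    return total / len(successes_per_task)


# Hypothetical results: 4 tasks, 3 trials each.
successes = [3, 2, 0, 3]
pass_3 = pass_hat_k(successes, num_trials=3, k=3)
pass_1 = pass_hat_k(successes, num_trials=3, k=1)
```

With three trials per task, pass^3 under this definition reduces to the fraction of tasks solved in every trial, which is why it is much stricter than a single-trial success rate.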
FinGAIA: A Chinese Benchmark for AI Agents in Real-World Financial Domain
Zeng, Lingfeng, Lou, Fangqi, Wang, Zixuan, Xu, Jiajie, Niu, Jinyi, Li, Mengping, Dong, Yifan, Qi, Qi, Zhang, Wei, Yang, Ziwei, Han, Jun, Feng, Ruilun, Hu, Ruiqi, Zhang, Lejie, Feng, Zhengbo, Ren, Yicheng, Guo, Xin, Liu, Zhaowei, Cheng, Dongpo, Cai, Weige, Zhang, Liwen
The booming development of AI agents presents unprecedented opportunities for automating complex tasks across various domains. However, their multi-step, multi-tool collaboration capabilities in the financial sector remain underexplored. This paper introduces FinGAIA, an end-to-end benchmark designed to evaluate the practical abilities of AI agents in the financial domain. FinGAIA comprises 407 meticulously crafted tasks, spanning seven major financial sub-domains: securities, funds, banking, insurance, futures, trusts, and asset management. These tasks are organized into three hierarchical levels of scenario depth: basic business analysis, asset decision support, and strategic risk management. We evaluated 10 mainstream AI agents in a zero-shot setting. The best-performing agent, ChatGPT, achieved an overall accuracy of 48.9%, which, while superior to non-professionals, still lags behind financial experts by over 35 percentage points. Error analysis revealed five recurring failure patterns, including Cross-modal Alignment Deficiency, Financial Terminological Bias, and Operational Process Awareness Barrier. These patterns point to crucial directions for future research. Our work provides the first agent benchmark closely related to the financial domain, aiming to objectively assess and promote the development of agents in this crucial field. Partial data is available at https://github.com/SUFE-AIFLM-Lab/FinGAIA.
Implement services for business scenarios by combining basic emulators
The Jiutian Intelligence Network Simulation Platform [1] decouples, encapsulates, and exposes interfaces for the key modules of the wireless communication receiver and transmitter, and supports research on replacing and combining modules. AI algorithm developers can therefore use its simulation environment for 5G+ typical functional network element intelligence and the new air interface to design and verify new intelligent algorithms efficiently and cost-effectively in a virtual environment. To help users get to know the wireless simulation platform from scratch, especially algorithm developers who have no background in communication systems, the platform provides open tasks (such as multi-objective antenna optimization, high-traffic services, and CSI compression feedback) that guide users from basic familiarity to advanced verification functions.
- Telecommunications (0.55)
- Leisure & Entertainment (0.34)
- Information Technology > Networks (0.34)
Emulators in JINSP
Zhao, Lei, Zhang, Miaomiao, Zhe, Lv
JINSP (Jiutian Intelligence Network Simulation Platform) provides a series of basic emulators and their combinations, such as the simulation of the protocol stack for dynamic users in a real environment, which is composed of user behavior simulation, base station simulation, and terminal simulation. These emulators are applied in specific business scenarios, such as multi-target antenna optimization and compression feedback. On this basis, this paper describes each emulator and its combinations in detail, including the implementation process of the emulator, its integration with the platform, and experimental results.
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Measuring the Stability of Process Outcome Predictions in Online Settings
Lee, Suhwan, Comuzzi, Marco, Lu, Xixi, Reijers, Hajo A.
Predictive Process Monitoring aims to forecast the future progress of process instances using historical event data. As predictive process monitoring is increasingly applied in online settings to enable timely interventions, evaluating the performance of the underlying models becomes crucial for ensuring their consistency and reliability over time. This is especially important in high-risk business scenarios where incorrect predictions may have severe consequences. However, predictive models are currently usually evaluated using a single, aggregated value or a time-series visualization, which makes it challenging to assess their performance and, specifically, their stability over time. This paper proposes an evaluation framework for assessing the stability of models for online predictive process monitoring. The framework introduces four performance meta-measures: the frequency of significant performance drops, the magnitude of such drops, the recovery rate, and the volatility of performance. To validate this framework, we applied it to two artificial and two real-world event logs. The results demonstrate that these meta-measures facilitate the comparison and selection of predictive models for different risk-taking scenarios. Such insights are of particular value for enhancing decision-making in dynamic business environments.
- Europe > Netherlands (0.04)
- Asia > South Korea > Ulsan > Ulsan (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Italy > Lazio > Rome (0.04)
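The abstract above names four stability meta-measures but does not give their formulas. The sketch below shows one plausible way to compute them from a series of per-window performance scores. Every definition here (the drop threshold, what counts as recovery, volatility as the standard deviation of window-to-window changes) is an illustrative assumption, not the paper's method, and the function name and sample scores are hypothetical.

```python
from statistics import pstdev


def stability_meta_measures(scores, drop_threshold=0.1):
    """Compute four illustrative stability measures.

    scores: model performance per time window (e.g. accuracy), in order.
    drop_threshold: assumed minimum decrease between consecutive windows
        that counts as a significant drop.
    """
    if len(scores) < 2:
        raise ValueError("need at least two windows")
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    # Index i marks a drop between window i and window i + 1.
    drops = [i for i, d in enumerate(deltas) if d <= -drop_threshold]

    drop_frequency = len(drops) / len(deltas)
    mean_drop_magnitude = (
        sum(-deltas[i] for i in drops) / len(drops) if drops else 0.0
    )
    # Assumption: a drop is recovered if any later window reaches the
    # score observed just before the drop.
    recovered = sum(
        1 for i in drops if any(s >= scores[i] for s in scores[i + 2:])
    )
    recovery_rate = recovered / len(drops) if drops else 1.0
    # Assumption: volatility is the spread of window-to-window changes.
    volatility = pstdev(deltas)

    return {
        "drop_frequency": drop_frequency,
        "mean_drop_magnitude": mean_drop_magnitude,
        "recovery_rate": recovery_rate,
        "volatility": volatility,
    }


# Hypothetical accuracy per window for one model.
measures = stability_meta_measures([0.80, 0.82, 0.60, 0.70, 0.83, 0.81])
```

Computing the same dictionary for several candidate models would allow a side-by-side comparison, for example preferring a low drop frequency in a high-risk scenario even at the cost of slightly lower average accuracy.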
AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction
Huan, Zhaoxin, Ding, Ke, Li, Ang, Zhang, Xiaolu, Min, Xu, He, Yong, Zhang, Liang, Zhou, Jun, Mo, Linjian, Gu, Jinjie, Liu, Zhongyi, Zhong, Wenliang, Zhang, Guannan
Click-through rate (CTR) prediction is a crucial issue in recommendation systems. Various public CTR datasets have emerged. However, existing datasets suffer from the following limitations. First, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Second, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR samples with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Information Technology (0.93)
- Marketing (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Recording of 50 Business Assignments
Sroka, Michal, Sani, Mohammadreza Fani
One of the main use cases of process mining is to discover and analyze how users follow business assignments, providing valuable insights into process efficiency and optimization. In this paper, we present a comprehensive dataset consisting of 50 real business processes. The dataset holds significant potential for research in various applications, including task mining and process automation, making it a valuable resource for researchers and practitioners.
- North America > United States > Texas > Coleman County (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence (0.70)
- Information Technology > Communications (0.69)
- Information Technology > Data Science (0.53)
Construction and Applications of Billion-Scale Pre-Trained Multimodal Business Knowledge Graph
Deng, Shumin, Wang, Chengming, Li, Zhoubo, Zhang, Ningyu, Dai, Zelin, Chen, Hehong, Xiong, Feiyu, Yan, Ming, Chen, Qiang, Chen, Mosha, Chen, Jiaoyan, Pan, Jeff Z., Hooi, Bryan, Chen, Huajun
Business Knowledge Graphs (KGs) are important to many enterprises today, providing factual knowledge and structured data that steer many products and make them more intelligent. Despite their promising benefits, building a business KG requires solving the prohibitive issues of deficient structure and multiple modalities. In this paper, we advance the understanding of the practical challenges related to building KGs in non-trivial real-world systems. We introduce the process of building an open business knowledge graph (OpenBG) derived from a well-known enterprise, Alibaba Group. Specifically, we define a core ontology to cover various abstract products and consumption demands, with fine-grained taxonomy and multimodal facts in deployed applications. OpenBG is an open business KG of unprecedented scale: 2.6 billion triples with more than 88 million entities covering over 1 million core classes/concepts and 2,681 types of relations. We release all the open resources (OpenBG benchmarks) derived from it for the community and report experimental results of KG-centric tasks. We also ran an online competition based on OpenBG benchmarks, which has attracted thousands of teams. We further pre-train OpenBG and apply it to many KG-enhanced downstream tasks in business scenarios, demonstrating the effectiveness of billion-scale multimodal knowledge for e-commerce. All the resources with codes have been released at https://github.com/OpenBGBenchmark/OpenBG.
- Asia > Singapore (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Information Technology (0.68)
- Education (0.48)
The Top 5 Reasons Why Most AI Projects Fail - DataScienceCentral.com
Due to the pandemic, most businesses are increasing their investments in AI. Organizations have accelerated their AI efforts to ensure their business is not majorly affected by the current pandemic. Though this is a positive development in terms of AI adoption, organizations need to be aware of the challenges in adopting AI. Building an AI system is not a simple task. It comes with challenges at every stage. Even if you build an AI project, there is a high chance of it failing upon deployment, which can be attributed to numerous reasons.